Abstract

Regression-based adjusted plus-minus statistics were developed in bas- 
ketball and have recently come to hockey. The purpose of these statistics is 
to provide an estimate of each player's contribution to his team, independent 
of the strength of his teammates, the strength of his opponents, and other 
variables that are out of his control. One of the main downsides of the or- 
dinary least squares regression models is that the estimates have large error 
bounds. Since certain pairs of teammates play together frequently, collinear- 
ity is present in the data and is one reason for the large errors. In hockey, the 
relative lack of scoring compared to basketball is another reason. To deal 
with these issues, we use ridge regression, a method that is commonly used 
in lieu of ordinary least squares regression when collinearity is present in 
the data. We also create models that use not only goals, but also shots, Fen- 
wick rating (shots plus missed shots), and Corsi rating (shots, missed shots, 
and blocked shots). One benefit of using these statistics is that there are 
roughly ten times as many shots as goals, so there is much more data when 
using these statistics and the resulting estimates have smaller error bounds. 
The results of our ridge regression models are estimates of the offensive and 
defensive contributions of forwards and defensemen during even strength, 
power play, and short handed situations, in terms of goals per 60 minutes. 
The estimates are independent of strength of teammates, strength of oppo- 
nents, and the zone in which a player's shift begins. 
1 Introduction



Though the plus-minus statistic was first used in hockey, advanced versions of 
plus-minus have been developing more quickly in basketball. These new versions 
attempt to correct one or more of the problems associated with the traditional plus- 
minus statistic, which depends heavily on the strength of a player's teammates 
and opponents, and on other variables out of a player's control. Regression-based 
versions of adjusted plus-minus (APM) statistics for NBA players can be found 
in Winston (2009), Rosenbaum (2004), Lewin (2007), Witus (2008), Ilardi and 
Barzilai (2008), Sill (2010), and Fearnhead and Taylor (2011).

In Macdonald (2011a) and Macdonald (2011b), the author developed similar 
models for hockey. In Macdonald (2011a), the author used weighted least squares
models similar to those in Rosenbaum (2004) and Ilardi and Barzilai (2008) to 
find the estimates of each player's offensive and defensive contribution during 
even strength situations, adjusted for the strength of his teammates and opponents. 
The contributions are given in terms of goals per 60 minutes and goals per season. 
Special teams situations are addressed in Macdonald (2011b). Information about 
the zone in which each shift begins was also used in Macdonald (2011b) in order
to get estimates that are independent of the zone on the ice in which a player 
typically begins his shifts. 

In many of the basketball articles, and also in the hockey articles Macdonald 
(2011a) and Macdonald (2011b), it was noted that one downside of the ordinary 
least squares regression models is that the results can have large error bounds, 
which are measures of uncertainty in the estimates. Since two main uses of these 
estimates could be (1) deciding which players to trade for and (2) establishing pa- 
rameters for salary negotiations, smaller errors, and hence more precise estimates, 
have significant value to NHL analysts and decision-makers. 

One reason for the large errors is the high collinearity in the data caused by 
teammates who play together frequently, a common occurrence in many sports. 
For example, Henrik and Daniel Sedin, twin brothers who play for the Vancouver 
Canucks, are almost always on the ice together. A regression model will have 
a difficult time telling them apart (both statistically and biologically) and their 
estimates tend to have large errors. In an extreme case where two players always 
play together, the ordinary least squares estimates will not even be unique. 
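
The extreme case can be illustrated with a small numerical sketch. The toy design matrix below is hypothetical (three players, with players 0 and 1 always on the ice together) and is not the paper's data. It only shows that identical indicator columns make the least squares problem rank deficient, and that a small ridge penalty (whose value here is arbitrary) restores a unique solution.

```python
import numpy as np

# Hypothetical toy data: 4 shifts, 3 players. Players 0 and 1 always
# appear together, so their indicator columns are identical.
X = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
y = np.array([2.0, 3.0, 1.0, 2.0])

gram = X.T @ X
rank = np.linalg.matrix_rank(gram)  # less than 3, so OLS is not unique

# Two different coefficient vectors give exactly the same fitted values,
# because only the sum of the first two coefficients is identified.
beta_a = np.array([2.0, 0.0, 1.0])
beta_b = np.array([0.5, 1.5, 1.0])
same_fit = np.allclose(X @ beta_a, X @ beta_b)

# A ridge penalty (lam > 0, chosen arbitrarily) makes the system invertible
# and splits the shared credit equally between the two players.
lam = 0.1
beta_ridge = np.linalg.solve(gram + lam * np.eye(3), X.T @ y)
```

In this sketch the ridge solution assigns the same coefficient to both inseparable players, which is one way to see why the penalized estimates are better behaved than the unpenalized ones.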

Another reason for the high errors in hockey is the relative lack of scoring
when compared to a sport like basketball. A typical hockey team scores only two
to four goals per game on average during a season. The low goal scoring rates,
coupled with the randomness and luck involved in goal scoring, make it difficult
to properly judge players using goals alone without using multiple seasons of
data. Additionally, a player's goals for and goals against while he is on the ice
depend on the quality of the goalies on the ice. Ideally, one would prefer to
estimate a player's abilities in a way that is independent of the quality of the
goalies he faces, and independent of the quality of the goalies on his team.

1.1 Brief summary of the new models 

In light of these observations, we make two modifications to the models given 
in Macdonald (2011a) and Macdonald (2011b). First, in lieu of ordinary least 
squares regression models, we use ridge regression models (Hoerl (1962), Hoerl 
and Kennard (1970)), similar to the model used for basketball in Sill (2010). Ridge 
regression frequently reduces the error bounds in the estimates and improves the 
predictive performance of the model when collinearity exists in the data. Ridge
regression introduces bias in the estimates, but the tradeoff is typically worthwhile.
The model is discussed in detail in Section 2. 

The second change we make is to form additional models that use three other 
statistics, in addition to goals, as the dependent variable. These additional mod- 
els use shots, Fenwick rating (shots plus missed shots), and Corsi rating (shots, 
missed shots, and blocked shots). These statistics were chosen because each of
them has been shown to be a very good indicator of performance at the team level
(JLikens (2011), Ferrari (2009)).

There are pros and cons to using these statistics, and that is one reason that 
we will use them in addition to goals and not instead of goals. For example, 
on one hand, shots, Fenwick rating, and Corsi rating ignore the shooting ability 
of players, although many hockey analysts would argue that a player's shooting 
ability is not nearly as significant as his ability to generate shots. Also, some 
would argue that missed shots and blocked shots should not be included or should 
not be considered good, since they are attempted shots that never reached the 
goal. However, if a team has more shots, missed shots, and blocked shots than
its opponents, it is most likely an indication of a territorial advantage and an
advantage in terms of puck possession. In order to take a shot, a player must 
possess the puck, and typically that player is also in the offensive zone. 

The relationships among goals, shots, Fenwick rating, and Corsi rating are 
described well in JLikens (2011) and discussed further in Macdonald (2012). In 
both articles, the authors show that shots, Fenwick rating, and Corsi rating are 
better indicators than goals of a team's future performance when one uses data 
from only half of a season. Based on this analysis, we believe the results based on 
shots, Fenwick rating and Corsi rating do have value, especially for our models 
that are based on only one season's worth of data. The reader can decide for him- 
or herself how much value those results have. 

One nice benefit of using these additional statistics is that they are far more
prevalent than goals. Typically, there are roughly 10 shots to every goal. The
extra data goes a long way toward producing estimates with much smaller error
bounds. Also, for the most part, those statistics are independent of goalies, so
the strength of the goalies on a player's team will not have much of an effect on
the estimates of his contributions. When using goals, the estimate of a player's
defensive contribution, in particular, can be positively or negatively affected
by the performance of the goalie playing behind him.

In order to more easily compare the results based on these additional statistics
with the results based on goals, the new results were rescaled using league
average shooting percentages. By shooting percentages we mean goals per shot
($\frac{\text{Goals}}{\text{Shots}}$), goals per Fenwick rating
($\frac{\text{Goals}}{\text{Shots} + \text{Missed Shots}}$), and goals per Corsi rating
($\frac{\text{Goals}}{\text{Shots} + \text{Missed Shots} + \text{Blocked Shots}}$).
League averages of these shooting percentages were computed for even strength,
power play, and short handed situations separately, using data from the last
four full NHL seasons.

The results based on shots, Fenwick rating, and Corsi rating were then rescaled 
by multiplying by the league average goals per shot, goals per Fenwick rating, and 
goals per Corsi rating, respectively. These results are in the units of expected goals 
per 60 minutes based on shots, Fenwick, or Corsi. A player's rescaled results 
based on shots can be thought of as his contribution to his team, in the units of 
expected goals per 60 minutes, based on shots for and shots against when he was 
on the ice. The results remain independent of the strength of his teammates, the 
strength of his opponents, and the zone in which his shifts begin. The rescaled 
results based on Fenwick rating and Corsi rating can be interpreted similarly. 
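
The rescaling step amounts to a single multiplication, as the following minimal sketch shows. The function names and all numbers are hypothetical and chosen only for illustration; they are not league figures from the paper.

```python
def league_shooting_pct(goals: float, events: float) -> float:
    """League average goals per event (shots, Fenwick events, or Corsi events)."""
    if events <= 0:
        raise ValueError("events must be positive")
    return goals / events


def rescale_to_expected_goals(rate_per_60: float, shooting_pct: float) -> float:
    """Convert an estimate in events per 60 minutes to expected goals per 60."""
    return rate_per_60 * shooting_pct


# Hypothetical league totals for one game situation (not taken from the paper).
goals, shots = 5000.0, 55000.0
pct = league_shooting_pct(goals, shots)

# Hypothetical shot-based estimate: +4.4 shots per 60 minutes.
expected_goals_per_60 = rescale_to_expected_goals(4.4, pct)
```

Because the league average is computed separately for each game situation, the same function would be called with different totals for even strength, power play, and short handed estimates.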

We use four separate ridge regression models for even strength situations using 
each of these four statistics (goals, shots, Fenwick rating, and Corsi rating) as the 
response variable. Each even strength model gives an even strength offensive and 
defensive component of APM for each player in terms of goals per 60 minutes or 
expected goals per 60 minutes. These components can be added to give a player's 
total contribution at even strength in terms of goals per 60 minutes. We also 
have four separate models for special teams situations, one for each of the four 
statistics. Each special teams model gives an offensive and defensive component 
on the power play, as well as an offensive and defensive component during short 
handed situations, in terms of goals per 60 minutes. In total, we get 36 estimates 
for each player in terms of goals per 60 minutes. If the results are expressed in 
terms of goals per season, then even strength, power play, and shorthanded results 
can be added to give estimates of offensive, defensive, and total contributions in 
all situations, in terms of goals per season. So, in this case, we get 48 different 
ratings for each player. This can be a bit of information overload, and when we 
present the results here, we will need to be selective regarding which components 
of APM are listed. Notation will be important as well. 
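
One way to arrive at the counts of 36 and 48 is to enumerate the combinations directly. The sketch below assumes that totals are counted alongside the offensive and defensive components, as in the layout of Table 1; the list names are illustrative only.

```python
from itertools import product

statistics = ["goals", "shots", "Fenwick", "Corsi"]
situations = ["even strength", "power play", "short handed"]
components = ["offense", "defense", "total"]

# Per 60 minutes: every statistic, situation, and component combination.
per_60 = list(product(statistics, situations, components))

# Per season: the same combinations plus an "all situations" row.
per_season = list(product(statistics, situations + ["all situations"], components))

print(len(per_60), len(per_season))  # 36 48
```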

1.2 Notation 

Notation for the offensive, defensive, and total contribution of a player (forward
or defenseman) during even strength, power play, and short handed situations,
using the model with goals as the response variable, is given in Table 1. The
adjusted plus-minus results based on shots, Fenwick rating, and Corsi rating are
denoted similarly, except with "S", "F", and "C", respectively, instead of the
"G" that is used for goals. For example, the even strength offensive components
of APM using goals, shots, Fenwick rating, and Corsi rating are denoted
$G^{off}_{EV}$, $S^{off}_{EV}$, $F^{off}_{EV}$, and $C^{off}_{EV}$, respectively.
The per 60 minute versions of these statistics are denoted similarly, but with a
subscript of "60" included. For example, the even strength offensive component of
adjusted plus-minus per 60 minutes using goals is denoted $G^{off}_{EV,60}$.

Table 1: Summary of notation for APM results using goals. For each player
(forward or defenseman), we have offensive, defensive, and total contributions
during even strength, power play, and short handed situations, in terms of goals
per season.

Strength        Offense         Defense         Total
Even strength   $G^{off}_{EV}$  $G^{def}_{EV}$  $G_{EV}$
Power play      $G^{off}_{PP}$  $G^{def}_{PP}$  $G_{PP}$
Short handed    $G^{off}_{SH}$  $G^{def}_{SH}$  $G_{SH}$
All situations  $G^{off}$       $G^{def}$       $G$

1.3 Example of the results 

In Table 2, we give an example of the results. We list the top 10 players in offense
during the 2007-08, 2008-09, 2009-10, and 2010-11 seasons according to $G^{off}$,
the offensive component of APM in terms of goals per season. We also list the
players' offensive contributions according to the APM models based on the other
statistics.

Recall that $S^{off}$, $F^{off}$, and $C^{off}$ have been rescaled by multiplying
by the league average goals per shot, goals per Fenwick rating, and goals per
Corsi rating, respectively. Note that these statistics are in the units of goals
per season or expected goals per season based on shots, Fenwick, or Corsi, so they
do depend on playing time. Sidney Crosby, for example, has missed significant time
in two of the four seasons, and that has a big impact on his rating, although he
still leads the league in $G^{off}$ by a sizeable margin.

Some per 60 minute results, $G^{off}_{EV,60}$ and $S^{off}_{EV,60}$, along with
their standard errors, are given in the last four columns of Table 2. These
statistics are independent of playing time, so they do not depend on how much the
players' coaches play them. They also do not depend on how much time these players
have spent on injured reserve. We believe that both the per season and per 60
minutes versions of these statistics have value, and we will continue to list both
versions in our tables.

Table 2: The top 10 offensive players in the NHL according to $G^{off}$.

Rank  Player             Pos  Team  $G^{off}$  $S^{off}$  $F^{off}$  $C^{off}$  $G^{off}_{EV,60}$  Err   $S^{off}_{EV,60}$  Err
1     Sidney Crosby      C    PIT   23         12         13         14         0.83               0.20  0.42               0.07
2     Jonathan Toews     C    CHI   18         8          8          9          0.45               0.20  0.22               0.07
3     Alex Ovechkin      LW   WSH   17         17         20         24         0.46               0.18  0.45               0.07
4     Daniel Sedin       LW   VAN   16         13         13         15         0.47               0.18  0.44               0.08
5     Joe Thornton       C    S.J   16         11         11         15         0.34               0.18  0.26               0.06
6     Nicklas Backstrom  C    WSH   16         11         12         14         0.23               0.19  0.28               0.07
7     Evgeni Malkin      C    PIT   15         11         11         12         0.40               0.20  0.31               0.06
8     Ryan Getzlaf       C    ANA   15         6          8          9          0.31               0.19  0.07               0.07
9     Pavel Datsyuk      C    DET   15         10         11         12         0.53               0.19  0.27               0.07
10    Jason Spezza       C    OTT   13         7          8          9          0.37               0.21  0.25               0.07

In this paper, we will mostly give results based on models that contain data 
from four NHL seasons: 2007-08, 2008-09, 2009-10, and 2010-11. However, 
since we are now using ridge regression, single season results are stable enough to 
have value. One might prefer to see a player's progression from season to season 
rather than seeing a single number for all four years. Also, one might prefer to 
make adjustments so that the statistics for a player are relative to a replacement 
player at the same position. An example of Sidney Crosby's APM statistics in each
of the past four seasons, with adjustments for position and replacement players, is 
given in Table 3. We have also included his 4-year results for comparison. 



Table 3: Sidney Crosby's APM statistics over the past four seasons using goals.

Year  Age    GP  $G^{off}$  $G^{def}$  $G$  $G^{off}_{EV}$  $G^{def}_{EV}$  $G_{EV}$  $G^{off}_{PP}$  $G^{def}_{PP}$  $G_{PP}$
2007  20     53  25         9          33   19              1               26        6               2               1
2008  21     77  30         3          33   21              2               23        9               1               10
2009  22     81  37         4          41   31              2               34        5               2               7
2010  23     41  17         1          18   13                              13        3               1               5
4-yr  20-23  63  29         1          30   23              1               24        5               1               6



We also note that the errors in our estimates are lower than those reported in
Macdonald (2011a) and Macdonald (2011b), where the author used ordinary least
squares (OLS) regression instead of ridge regression. As an example, we give
Alex Ovechkin's even strength offensive contributions per 60 minutes in Table 4,
along with their standard errors. The errors in Ovechkin's $G^{off}_{EV,60}$ are
smaller than those reported in Macdonald (2011a) and Macdonald (2011b). Also, the
errors in Ovechkin's $S^{off}_{EV,60}$, $F^{off}_{EV,60}$, and $C^{off}_{EV,60}$
are smaller than the errors in $G^{off}_{EV,60}$. This trend can also be seen in
Table 2. The standard errors in $S^{off}_{EV,60}$ are much lower than the standard
errors in $G^{off}_{EV,60}$ for all of the players in that table. The errors are
still not small enough to be ignored, as the confidence intervals of many of the
estimates still overlap. Nevertheless, the APM estimates with smaller error bounds,
coupled with the additional APM estimates based on shots, Fenwick rating, and
Corsi rating, are useful metrics with which to analyze the performance of NHL
players.
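
The overlap of confidence intervals can be checked with a few lines of arithmetic. The sketch below uses estimates and standard errors listed in Table 2, and it assumes approximate 95% intervals of the form estimate plus or minus 1.96 standard errors; that interval construction is an assumption made here for illustration, not something stated in the text.

```python
def confidence_interval(estimate: float, std_err: float, z: float = 1.96):
    """Approximate interval estimate +/- z * std_err (normality is assumed)."""
    return estimate - z * std_err, estimate + z * std_err


def intervals_overlap(a, b) -> bool:
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Even strength offense per 60 minutes using goals (estimate, standard error).
crosby = confidence_interval(0.83, 0.20)
toews = confidence_interval(0.45, 0.20)

# The same statistic using shots, where the standard errors are smaller.
crosby_shots = confidence_interval(0.42, 0.07)
toews_shots = confidence_interval(0.22, 0.07)
getzlaf_shots = confidence_interval(0.07, 0.07)

goals_overlap = intervals_overlap(crosby, toews)
shots_overlap = intervals_overlap(crosby_shots, toews_shots)
shots_separated = not intervals_overlap(crosby_shots, getzlaf_shots)
```

Under these assumptions the Crosby and Toews intervals overlap for both goals and shots, while the narrower shot-based intervals are able to separate Crosby from Getzlaf.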



Table 4: Alex Ovechkin's EV offense statistics, with standard errors.

Player         Pos  Team  $G^{off}_{EV,60}$  Err   $S^{off}_{EV,60}$  Err   $F^{off}_{EV,60}$  Err   $C^{off}_{EV,60}$  Err
Alex Ovechkin  LW   WSH   0.46               0.18  0.45               0.07  0.53               0.06  0.63               0.05



The rest of this paper is organized as follows. First, we describe the ridge 
regression models in detail in Section 2. In Section 3, we give the players that 
APM determines as the Hart Trophy, Norris Trophy, and Selke Trophy finalists 
(most valuable player, best defenseman, and best defensive forward, respectively)
during the 2007-08, 2008-09, 2009-10, and 2010-11 seasons combined. We finish 
with conclusions and future work in Section 4. In the Appendix, we give a brief 
comparison of ordinary least squares and ridge regression, and describe how we 
chose our ridge parameter in our ridge regression models. 

2 Ridge Regression Model 

We now describe the setup of our model. We use information about the players 
on the ice during every shift of every game during the 2007-08, 2008-09, 2009- 
10, and 2010-11 seasons, as well as the outcome of each shift. We divide this 
data into even strength and special teams situations, and we remove empty net 
situations from both data sets. Each shift gives two lines of data: one line cor- 
responding to the goals per 60 minutes scored by the home team, and one line 
corresponding to the goals per 60 minutes scored by the away team. Both of these 
observations are weighted by the duration of that shift. We denote the total
number of observations by N. For even strength, we have N = 2,324,528, while for
special teams situations, we have N = 461,022. We note that the average duration
of a shift is 4.5 seconds longer for special teams than for even strength. Other
observations about shift lengths and ice time for players can be found, along with
Figures 4 and 5, in the Appendix.

Let $J$ denote the number of players in the league, let $y$ denote the goals (or
shots, Fenwick rating, or Corsi rating) per 60 minutes during an observation, and
let $X_j$ and $D_j$ be indicator variables that are defined as follows:

$$
\begin{aligned}
X_j &= \begin{cases}
1, & \text{skater } j \text{ is on offense during the observation;} \\
0, & \text{skater } j \text{ is not playing or is on defense during the observation;}
\end{cases} \\
D_j &= \begin{cases}
1, & \text{skater } j \text{ is on defense during the observation;} \\
0, & \text{skater } j \text{ is not playing or is on offense during the observation;}
\end{cases}
\end{aligned}
\tag{1}
$$

where $1 \le j \le J$. Note that by "skater" we mean a forward or a defenseman,
but not a goalie. We also note that for the models which use goals, we included
defensive variables for goalies. Let $Z_{off}$ and $Z_{def}$ be indicator variables
defined as follows:

$$
\begin{aligned}
Z_{off} &= \begin{cases}
1, & \text{the observation corresponds to a shift that begins with a faceoff in the offensive zone,} \\
0, & \text{otherwise;}
\end{cases} \\
Z_{def} &= \begin{cases}
1, & \text{the observation corresponds to a shift that begins with a faceoff in the defensive zone,} \\
0, & \text{otherwise.}
\end{cases}
\end{aligned}
\tag{2}
$$

To clarify, we give an example. Suppose that in one shift, skaters 1-5 are on
the ice for the home team, and skaters 6-10 are on the ice for the away team.
Suppose that this is a shift of duration $t_1$ seconds, and that the home team
scores a goal during this shift. For this shift we would have two lines of data,
one for goals per 60 minutes scored by the home team, and the second for goals per
60 minutes scored by the away team. These two rows of data would look like this:

$$
\begin{aligned}
X &= [1\ 1\ 1\ 1\ 1\ 0\ 0\ 0\ 0\ 0\ 0 \cdots 0], & D &= [0\ 0\ 0\ 0\ 0\ 1\ 1\ 1\ 1\ 1\ 0 \cdots 0], & y &= \frac{1}{t_1} \cdot 3600, \\
X &= [0\ 0\ 0\ 0\ 0\ 1\ 1\ 1\ 1\ 1\ 0 \cdots 0], & D &= [1\ 1\ 1\ 1\ 1\ 0\ 0\ 0\ 0\ 0\ 0 \cdots 0], & y &= \frac{0}{t_1} \cdot 3600.
\end{aligned}
$$

We note that $\frac{1}{t_1}$ is in the units of goals per second, so we multiply by
3,600 to get goals per 60 minutes. For even strength situations, we start with the
following linear model:

$$
y = \beta_0 + \beta_1 X_1 + \cdots + \beta_J X_J + \delta_1 D_1 + \cdots + \delta_J D_J + \zeta_{off} Z_{off} + \zeta_{def} Z_{def}. \tag{3}
$$

The quantities of interest are the $\beta_j$'s and $\delta_j$'s, which are player
$j$'s offensive and defensive contributions, respectively, in terms of goals per
60 minutes. The coefficients $\zeta_{off}$ and $\zeta_{def}$ can be regarded as
estimates of the value of starting a shift in the offensive or defensive zone,
respectively, in terms of goals per 60 minutes.

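
The construction of the two observations for a single shift can be sketched in code as follows. The function name, the 45-second duration, and the league size of 12 skaters are made-up values for illustration; the zone-start indicators are omitted to keep the sketch short.

```python
import numpy as np


def shift_rows(n_skaters, home, away, home_goals, away_goals, seconds):
    """Return the two (x, d, y, weight) observations for one shift.

    x marks skaters on offense, d marks skaters on defense, y is the goals
    per 60 minutes scored by the team on offense, and the weight is the
    shift duration in seconds.
    """
    rows = []
    for offense, defense, goals in ((home, away, home_goals),
                                    (away, home, away_goals)):
        x = np.zeros(n_skaters)
        d = np.zeros(n_skaters)
        x[list(offense)] = 1.0
        d[list(defense)] = 1.0
        y = goals / seconds * 3600.0  # goals per second -> goals per 60 minutes
        rows.append((x, d, y, float(seconds)))
    return rows


# Skaters 1-5 (indices 0-4) play for the home team and skaters 6-10
# (indices 5-9) for the away team; the home team scores once.
home_row, away_row = shift_rows(12, range(0, 5), range(5, 10),
                                home_goals=1, away_goals=0, seconds=45)
```

Each shift therefore contributes one row in which the home skaters are on offense and one row in which they are on defense, both carrying the same weight.
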
For special teams situations, we start with a model that is similar to (3) and is
described in Macdonald (2011b). In total, there are 8 models: an even strength
model and a special teams model for each of the four statistics.

A linear model like (3) can also be expressed as a system of linear equations
in matrix form as

$$
y = X\beta, \tag{4}
$$

where $y$ is an $N \times 1$ vector of response variables, $X$ is an
$N \times (2J + 3)$ matrix of the explanatory variables, and $\beta$ is a
$(2J + 3) \times 1$ vector of coefficients, the quantities we are interested in.
Typically, when the number of observations, $N$, is much greater than the number
of explanatory variables, $2J + 3$, no solution to (4) exists, and one must find
some sort of "best fit" solution.

Instead of using OLS as in Macdonald (2011a) and Macdonald (2011b) to 
find the best fit, we use ridge regression. For the sake of those readers who are 
unfamiliar with ridge regression, we give a brief description of how to find the best 
fit estimates using OLS regression and ridge regression, and how the two methods 
are related, in the Appendix. We also discuss how we chose the ridge parameter
$\lambda$ in that section.
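
For readers who prefer code to formulas, a weighted ridge fit can be sketched with the standard closed form $(X^T W X + \lambda I)\beta = X^T W y$, where $W$ is the diagonal matrix of observation weights. This is a generic textbook formulation under stated assumptions, not necessarily the exact procedure used here: in particular, penalizing every column equally (including any intercept column) is a simplification, and the data below are made up.

```python
import numpy as np


def weighted_ridge(X, y, weights, lam):
    """Solve (X^T W X + lam * I) beta = X^T W y for beta.

    X: (N, p) design matrix, y: (N,) responses, weights: (N,) nonnegative
    observation weights, lam: ridge parameter (lam = 0 gives weighted OLS
    when X^T W X is invertible). All columns are penalized equally here,
    which is a simplifying assumption.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    XtW = X.T * w  # same as X.T @ diag(w) without forming the N x N matrix
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)


# Tiny made-up example: 3 observations, 2 explanatory variables.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([30.0, 45.0, 60.0])

beta_ols = weighted_ridge(X, y, w, lam=0.0)
beta_ridge = weighted_ridge(X, y, w, lam=10.0)
```

With `lam=0.0` the sketch reproduces the weighted least squares fit, and any positive `lam` shrinks the coefficient vector toward zero.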

2.1 Differences in OLS and Ridge Estimates 

The effect that this ridge parameter $\lambda$ has on the estimates can be seen in
Figure 1. In this example, we plot the estimated coefficient for $G^{off}_{PP,60}$
(offensive contribution on the power play in terms of goals per 60 minutes) of a
few players in the league for different choices of $\lambda$. Note that when
$\lambda = 0$, Pavel Datsyuk (solid red line) actually has a negative estimate,
and Brandon Dubinsky (solid blue line) has a very high positive estimate in line
with the league's elite offensive players. Dubinsky is a valuable offensive
player, but one would not expect his rating to be that much higher than Datsyuk's
rating or among the league's elite. Also, we would not consider Datsyuk to be a
below average player on the power play. We note that $\lambda = 0$ corresponds to
the ordinary least squares estimates, so these are the estimates we would have
gotten for Dubinsky and Datsyuk if we had not used ridge regression.

However, notice that for larger choices of $\lambda$, the estimates begin to
stabilize. Datsyuk's estimate moves towards the estimates of the league's elite
players, while Dubinsky's estimate returns to a more reasonable level. These
estimates agree with most people's intuition that Datsyuk is an elite offensive
player, while Dubinsky is an above average offensive player, but not an elite
player as his ordinary least squares estimate suggested.

For Dubinsky, the unexpected result for $\lambda = 0$ is probably due to minimal
playing time. For Datsyuk, it is probably due to the fact that he spent 90% of his
power play time with one of his teammates, Nicklas Lidstrom. While Datsyuk's
estimate starts below zero for $\lambda = 0$ and increases as $\lambda$ increases,
Lidstrom's estimate (dotted and dashed, light blue line) is off the figure near
4.0 for $\lambda = 0$, and rapidly decreases as $\lambda$ increases. While we
would expect Lidstrom to have a good offensive rating on the power play, 4.0 is
unusually high, and the ridge regression seems to be tempering Lidstrom's estimate
while correcting Datsyuk's.

[Figure 1 plot: estimates of $G^{off}_{PP,60}$ against $\lambda$, with $\lambda$
from 0.0 to 2.0 on the horizontal axis.]

Figure 1: Estimates of $G^{off}_{PP,60}$ for some players, for different values of
$\lambda$.

Datsyuk and Dubinsky are not the only players whose estimates exhibit this
behavior. We give the trace curves of the 25 players whose coefficients were the
most positively (resp. negatively) affected by the ridge regression as the dotted
(resp. solid) lines in Figure 2. In many cases, there are drastic changes in a
player's value relative to the other players in the league. A player may be worth
1 goal per 60 minutes more than another player according to their OLS estimates
($\lambda = 0$) but worth 0.5 goals per 60 minutes less according to their ridge
estimates ($\lambda = 0.5$).
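
Trace curves of this kind can be produced by refitting the model over a grid of ridge parameters. The sketch below uses a made-up, unweighted design in which two players share the ice in 9 of 10 observations; the data and the grid of values are hypothetical and serve only to show how the estimates of nearly collinear players move toward each other as the penalty grows.

```python
import numpy as np

# Made-up design: players 0 and 1 share the ice in 9 of 10 observations, so
# their columns are nearly collinear. Player 2 plays independently.
X = np.array([
    [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 0],
    [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 0, 1],
], dtype=float)
# Made-up responses with some noise in them.
y = np.array([2.1, 2.9, 1.8, 3.2, 2.0, 3.1, 2.2, 2.8, 1.9, 3.0])

lambdas = [0.0, 0.1, 0.5, 1.0, 2.0]
trace = []
for lam in lambdas:
    beta = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    trace.append(beta)
trace = np.array(trace)  # row i holds the estimates for lambdas[i]

# Gap between the two nearly collinear players at each value of lambda.
gap = np.abs(trace[:, 0] - trace[:, 1])
```

In this toy example the unpenalized fit gives all of the shared credit to player 0, based on a single observation, and the gap between the two players narrows once a penalty is applied.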

[Figure 2 plot: estimates of $G^{off}_{PP,60}$ against $\lambda$, with legend
entries Risers, Fallers, 1st percentile, and 99th percentile.]

Figure 2: Estimates of $G^{off}_{PP,60}$ for different values of $\lambda$ for the
25 players whose coefficients were the most positively (resp. negatively) affected
by the ridge regression, plotted as dashed (resp. solid) lines.

2.2 Year-to-year correlations

We note that the ridge estimates tend to be more consistent from year to year than
the OLS estimates. In Figure 3 we give three examples of year-to-year correlations
for three of the components of APM. In the left figure, we see that our ridge
estimates for offense at even strength using goals tend to be more consistent
than the corresponding OLS estimates from Macdonald (2011a) (which also used
goals). Also, the ridge estimates that use shots, Fenwick, and Corsi tend to be
more consistent than the ridge estimates that use goals.

In the middle figure, we see that these trends are true for power play offense 
as well. For short handed defense, the ridge estimates using goals are not more 
consistent than the OLS estimates, although the correlations for shots, Fenwick 
and Corsi are still higher. We note that, in general, the even strength estimates 
tend to have higher year-to-year correlations than the power play and short handed 
estimates. This trend is expected, since there is much less data for special teams 
situations than for even strength situations. 
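
A year-to-year correlation of this kind can be computed as in the following sketch. The function name and all numbers are made up; pairing each player's per 60 minutes estimates from two consecutive seasons and requiring the minimum ice time in both seasons are assumptions about the procedure, not details given in the text.

```python
import numpy as np


def year_to_year_correlation(est_a, est_b, min_a, min_b, min_minutes):
    """Pearson correlation of per-60 estimates in consecutive seasons.

    Only players who reach min_minutes in both seasons are included.
    """
    est_a, est_b = np.asarray(est_a, float), np.asarray(est_b, float)
    keep = (np.asarray(min_a) >= min_minutes) & (np.asarray(min_b) >= min_minutes)
    if keep.sum() < 2:
        raise ValueError("need at least two qualifying players")
    return float(np.corrcoef(est_a[keep], est_b[keep])[0, 1])


# Made-up estimates and ice time for five players in two seasons.
season_1 = [0.40, 0.10, -0.20, 0.30, 0.90]
season_2 = [0.35, 0.05, -0.10, 0.25, -0.80]
minutes_1 = [900, 750, 600, 820, 120]
minutes_2 = [880, 700, 650, 640, 300]

r = year_to_year_correlation(season_1, season_2, minutes_1, minutes_2, 500)
```

The fifth player falls below the 500 minute threshold and is excluded, which keeps one noisy low-minute estimate from dominating the correlation.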



[Figure 3 plots: year-to-year correlations of APM estimates in three panels (EV
offense, PP offense, SH defense), each with bars for OLS, Goals, Shots, Fenwick,
and Corsi.]

Figure 3: A comparison of year-to-year correlations for OLS estimates (OLS)
and our new ridge regression estimates (Goals, Shots, Fenwick, Corsi). (Left) Even
strength offense, minimum 500 minutes. (Middle) Power play offense, minimum
150 minutes. (Right) Short handed defense, minimum 150 minutes. To compute
the correlations, the per 60 minutes versions of these statistics were used.

3 Results

We now consider performance during the 2007-08, 2008-09, 2009-10, and 2010-11
seasons combined and determine the "four-year" Selke Trophy finalists (best
defensive forwards), Norris Trophy finalists (best defensemen), and Hart Trophy
finalists (most valuable players), according to APM. Although the NHL typically
announces three finalists for each trophy, we will give our top 5 finalists for
each award and discuss other notable players.

3.1 Four-year Selke Trophy finalists for Best Defensive Forward

Each season, the Selke Trophy is awarded to the forward who "best excels at the
defensive aspects of the game" (NHL.com (2010)). In practice, the award winner is
typically a great defensive forward who contributes offensively as well. In Table
5, we give the top defensive forwards in the league during the 2007-08, 2008-09,
2009-10, and 2010-11 seasons according to $G^{def}$. Recall that $S^{def}$,
$F^{def}$, and $C^{def}$ have been rescaled by multiplying by the league average
goals per shot, goals per Fenwick rating, and goals per Corsi rating, respectively.
Recall that these statistics are in the units of goals per season or expected
goals per season based on shots, Fenwick, or Corsi.

Table 5: The top 5 defensive forwards according to $G^{def}$.

Player          Pos  Team  $G^{def}$  $S^{def}$  $F^{def}$  $C^{def}$  $G^{def}_{EV,60}$  Err   $S^{def}_{EV,60}$  Err
Pavel Datsyuk   C    DET   12         8          7          6          0.44               0.19  0.30               0.07
David Krejci    C    BOS   11         3          2                     0.52               0.20  0.18               0.07
Chris Higgins   LW   VAN   10                    -1         -2         0.33               0.21  0.03               0.07
Tomas Plekanec  C    MTL   10         -0         -1         -3         0.30               0.20  0.03               0.07
Mikko Koivu     C    MIN   9          3          3          3          0.53               0.21  0.20               0.08

Pavel Datsyuk seems to be the clear choice as the best defensive forward in
the NHL according to APM. He is the league leader in all 4 flavors of defensive
contribution, and is also the best offensive player on the list. The voters seem to
agree: Datsyuk was awarded the Selke Trophy in 2007-08, 2008-09, and 2009-10,
and he was a finalist in 2010-11. Tomas Plekanec and Chris Higgins are
on this list, but one might consider the next best candidates to be David Krejci
and Mikko Koivu due to their superior ability to reduce the opposition's shots,
Fenwick rating, and Corsi rating. Interestingly, multi-year finalist and 2010-11
winner Ryan Kesler is not on this list, although he did have very good numbers in
2010-11. We note that 5 other players were tied with Koivu for 5th in $G^{def}$
with 9 to round out the top 10 in that category. Daymond Langkow, who was 11th in
$G^{def}$ with 8, missed the top 10 in $G^{def}$, but was second in $S^{def}$, and
third in both $F^{def}$ and $C^{def}$. In light of those rankings, Langkow could be
considered one of the best defensive forwards in the game.

3.2 Four-year Norris Trophy finalists for Best Defenseman

The James Norris Memorial Trophy is given each year to the defenseman who
"demonstrates throughout the season the greatest all-round ability in the position"
(NHL.com (2010)). In Table 6, we give the top defensemen in the league during
the 2007-08, 2008-09, 2009-10, and 2010-11 seasons according to $G$. It is not too
surprising that Zdeno Chara and Nicklas Lidstrom were the best defensemen in
the NHL during those seasons according to $G$. Zdeno Chara's APM results based
on shots, Fenwick rating, and Corsi rating are better than those of Lidstrom, so
one might choose to select him as the best defenseman. Brian Campbell and
Brian Rafalski are both strong across the board. Interestingly, Andrei Markov
does not rate very well in the APM estimates based on shots, Fenwick rating,
and Corsi rating. One might prefer to include Chris Pronger, Dan Boyle, or Kris
Letang instead of Markov on this list due to their ratings in $S$, $F$, and $C$.
Boyle, for example, led the league in $S$, $F$, and $C$.

Table 6: The top 5 defensemen during the last 4 seasons, according to $G$.

Player            Pos  Team  $G$  $S$  $F$  $C$  $G^{off}_{EV,60}$  $S^{off}_{EV,60}$  $G^{off}_{PP,60}$  Err   $S^{off}_{PP,60}$  Err
Zdeno Chara       D    BOS   19   9    9    10   0.10               0.21               0.43               0.33  0.58               0.07
Nicklas Lidstrom  D    DET   19   1    3    5    -0.06              0.07               1.37               0.26  0.71               0.05
Brian Campbell    D    CHI   14   7    7    8    0.12               0.16               0.23               0.36  0.32               0.08
Andrei Markov     D    MTL   13   -3   -1   1    0.20               0.13               1.75               0.37  0.59               0.08
Brian Rafalski    D    DET   13   9    8    11   -0.02              0.22               0.84               0.30  0.65               0.06


3.3 Four-year Hart Trophy finalists for Most Valuable Player



The Hart Memorial Trophy is given each year to the player "judged to be the most
valuable to his team" (NHL.com, 2010). Since APM is not computed for goalies,
we restrict our attention to only forwards and defensemen. Typically, the Hart 
Trophy winner is a forward, in part because defensemen and goalies already have 
a trophy dedicated to the best player at those positions. In Table 7, we list the top
5 players in the league according to G.

Table 7: The top 5 players according to G.

Player            Pos  Team   G   S   F   C  $G^{off}_{EV,60}$  $S^{off}_{EV,60}$  $G^{off}_{PP,60}$   Err  $S^{off}_{PP,60}$   Err
Pavel Datsyuk     C    DET   27  18  17  18               0.53               0.27               0.77  0.31               0.70  0.06
Jonathan Toews    C    CHI   24  11  10  11               0.45               0.22               1.67  0.34               0.79  0.07
Alex Ovechkin     LW   WSH   24  18  19  23               0.46               0.45               0.87  0.26               0.84  0.05
Daniel Sedin      LW   VAN   23  16  16  17               0.47               0.44               1.11  0.26               0.73  0.06
Sidney Crosby     C    PIT   22  11  12  12               0.83               0.42               0.98  0.29               0.58  0.06

According to G, Pavel Datsyuk was the most valuable player in the league during
the four seasons in question thanks to his excellent two-way play. Datsyuk is also
tied for first in S and is third in F and C.

Given the number of shots that Ovechkin throws at the net, it is not surprising
that he is the leader in S, F, and C, as well as the corresponding offensive
components $S^{off}$, $F^{off}$, and $C^{off}$. Ovechkin and Daniel Sedin have each
won the Hart Trophy during the past four years, while Jonathan Toews has been a
consistently excellent two-way player. Toews has been a Selke finalist and a Conn
Smythe Trophy winner for the best player in the playoffs. Unfortunately, Crosby
missed significant time because of injury in two of the seasons that are used in
this model. Despite the injuries, Crosby still rates as the top offensive player in
the league according to $G^{off}$, as we saw earlier in Table 2.



4 Conclusions and Future Work 

The use of ridge regression, and the addition of adjusted plus-minus models based 
on shots, Fenwick rating, and Corsi rating, are two valuable modifications of the 
earlier APM models in hockey. Other modifications could prove useful as well. 
Different estimation techniques, such as that in Thomas et al. (2012), could be 
used. Different outcome variables could also be used. 

For example, one could also consider using weighted shots as the response 
variable in an APM model. By "weighted shots" we mean the following. We 
could estimate the probability that a shot on goal will be a goal using distance, 
type of shot, and other details as explanatory variables. Such shot quality models
have been developed by Ken Krzywicki (2005, 2009, 2010) and Michael Schuckers
(2011). Then, each shot can be weighted based on the probability that it will be a
goal. In a forthcoming article (Macdonald et al.), the authors create a shot
quality model similar
to Krzywicki's logistic regression models, and use the resulting weighted shots as 
the outcome variable in a ridge regression model similar to the one described in 
this paper. The results of this model are estimates of W, an adjusted plus-minus 
rating based on weighted shots. 
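
To make the idea of weighted shots concrete, the following is a minimal sketch of how shots could be weighted by a logistic shot quality model. The coefficients, the use of distance as the only explanatory variable, and the function names are hypothetical and chosen only for illustration; they are not estimates from any of the models cited above.

```python
import math

# Hypothetical logistic coefficients, chosen only for illustration.
# They are NOT estimates from any shot quality model cited in the text.
INTERCEPT = -1.0
DISTANCE_COEF = -0.05  # change in log-odds per foot from the net


def goal_probability(distance_ft):
    """Logistic model: P(goal) = 1 / (1 + exp(-(a + b * distance)))."""
    z = INTERCEPT + DISTANCE_COEF * distance_ft
    return 1.0 / (1.0 + math.exp(-z))


def weighted_shots(distances_ft):
    """Sum of per-shot goal probabilities (the weighted shots total)."""
    return sum(goal_probability(d) for d in distances_ft)


# Made-up shot distances (in feet) for and against during one shift.
shots_for = [10.0, 25.0, 40.0]
shots_against = [15.0, 55.0]

# Weighted shot differential, which could serve as the response variable.
weighted_diff = weighted_shots(shots_for) - weighted_shots(shots_against)
```

Under this sketch, a shot from close range counts for more than a shot from far away, so the weighted total is always smaller than the raw shot count.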

Also, recall that Fenwick rating and Corsi rating are combinations of shots, 
missed shots, and blocked shots, and are a good indication of possession advan- 
tage and team performance in general. One could build on the idea of using those 
statistics and consider other statistics like hits, faceoffs, and zone starts as well. 
In Macdonald (2012), the author estimates which combinations of these statistics are
the best predictors of goal scoring at the team level. The results of the model 
can be interpreted as "expected goals". These expected goals are then used as the 
outcome in a ridge regression similar to the model described in this paper. The 
results are estimates of E, an adjusted plus-minus rating based on expected goals. 
Another approach that uses several different statistics can be found in Schuckers
et al. (2011).

We hope that the ideas presented in this paper will be useful to fans, analysts, 
coaches and teams as they analyze the performance of NHL players, and will 
inspire future work in performance analysis in hockey. 

5 Appendix

5.1 Ordinary Least Squares 

To find the "best fit" solution of (4) using ordinary least squares (OLS) regression,
one finds the $\beta_j$s, $\delta_j$s, and $\zeta$s that minimize the sum of squared error

$$Q = \sum_i (y_i - \hat{y}_i)^2, \tag{5}$$

where $\hat{y}_i$ is the predicted outcome for observation $i$ and is given by

$$\hat{y}_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_J X_{iJ} + \delta_1 D_{i1} + \cdots + \delta_J D_{iJ} + \zeta_{off} Z_{off,i} + \zeta_{def} Z_{def,i}. \tag{6}$$

In matrix notation, the sum of squared error $Q$ in (5) can be written

$$Q = (y - X\beta)^T (y - X\beta), \tag{7}$$

where $(y - X\beta)^T$ denotes the transpose of $y - X\beta$. Equivalently, finding the least
squares estimates of $\beta$ amounts to finding the $\beta$ that solves the system

$$X^T X \beta = X^T y, \tag{8}$$

which is obtained by multiplying both sides of (4) by $X^T$ on the left. When there
is only one predictor variable, finding $\beta$ can be thought of as finding the line that
best fits the data. With two predictor variables, one finds the plane that best fits
the data. With more than two variables, the case we have in this paper, one finds
the best fit hyperplane.

If the kernel (or nullspace) of $X$ is 0, which is typically true when $N \gg J$,
then $X^T X$ is invertible, and we can solve for $\beta$ by multiplying both sides of (8) by
$(X^T X)^{-1}$ on the left, giving

$$\beta = (X^T X)^{-1} X^T y.$$

Further details about OLS from a linear algebraic point of view can be found in 
most standard undergraduate linear algebra textbooks (for example, Strang (1988), 
Bretscher (2009), or Lay (2006)) or a multiple linear regression textbook (for 
example, Kutner et al. (2004)). 
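
As a small numerical illustration of the normal equations (8), the sketch below solves them on synthetic data and compares the result with a standard least squares routine. The data, the dimensions, and the variable names are assumptions made for this example only and are unrelated to the NHL data used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic design matrix with n observations and p predictors.
# The sizes and the "true" coefficients are arbitrary illustrative choices.
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Solve the normal equations X^T X beta = X^T y, as in (8).
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Solving the linear system directly, rather than forming the inverse of $X^T X$ explicitly, is the usual numerical practice and gives the same estimates.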

Ordinary least squares was the approach taken in Macdonald (2011a) and
Macdonald (2011b) and several of the basketball articles. Unfortunately,
collinearity in $X$ results in high standard errors for $\beta$. A linear algebraist might
prefer the viewpoint that if two teammates play together often, then two columns
of $X$ are nearly the same, the columns of $X$ are nearly linearly dependent, and the
corresponding columns (and rows) of $X^T X$ are nearly linearly dependent, which
means that $X^T X$ is nearly singular and has a high condition number. A high
condition number means that solutions to (8) are sensitive to small changes in the
data, an undesirable property. It also leads to large standard errors in the
estimates of $\beta$.
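
The effect of frequently paired teammates on the condition number can be seen in a toy example. The sketch below builds indicator columns for hypothetical players, two of whom are on the ice together in all but a handful of rows; the sizes and the number of differing rows are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500  # number of synthetic shifts

# Indicator (0/1) columns for hypothetical players.
a = rng.integers(0, 2, size=n).astype(float)
c = rng.integers(0, 2, size=n).astype(float)   # an unrelated player
d = rng.integers(0, 2, size=n).astype(float)   # another unrelated player

# Player b is almost always on the ice with player a:
# copy a's column and change it in only 5 rows.
b = a.copy()
flip = rng.choice(n, size=5, replace=False)
b[flip] = 1.0 - b[flip]

X_collinear = np.column_stack([a, b, c])     # contains the near-duplicate pair
X_independent = np.column_stack([a, d, c])   # no frequently paired teammates

cond_collinear = np.linalg.cond(X_collinear.T @ X_collinear)
cond_independent = np.linalg.cond(X_independent.T @ X_independent)
```

In this toy setting `cond_collinear` is far larger than `cond_independent`, which is the near-singularity described above.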

5.2 Ridge Regression 

In ridge regression, instead of finding the $\beta$ that minimizes (7), one standardizes
the columns of $X$ and finds the $\beta$ that minimizes

$$Q = (y - X\beta)^T (y - X\beta) + \lambda \beta^T \beta, \tag{9}$$

where $\lambda$ is a ridge parameter that needs to be chosen. Note that (9) is similar to
(7) but with the penalty term $\lambda \beta^T \beta$ included. Equivalently, instead of solving (8)
for $\beta$, one solves the equation

$$(X^T X + \lambda I)\beta = X^T y, \tag{10}$$

for $\beta$, where $I$ denotes the identity matrix. Note that (10) is similar to (8) but with
the penalty term $\lambda I$ included.

To solve (10), one multiplies both sides of the equation by $(X^T X + \lambda I)^{-1}$ on
the left, which gives

$$\beta = (X^T X + \lambda I)^{-1} X^T y. \tag{11}$$
These estimates $\beta$ are the estimates that we use in the next section to evaluate
players. The interpretation of $\beta$ is the same for ridge regression as it was with
OLS regression. In our case, coefficients $\beta$ are estimates of the offensive and
defensive contributions of players in terms of goals per 60 minutes, independent
of the strength of their teammates and opponents, and independent of the zone in
which their shifts begin.

The effect of the penalty term is to penalize large values for the coefficients $\beta$.
Ridge regression can be thought of like OLS regression, which finds the "best fit"
hyperplane, but with constraints on the coefficients $\beta$ that prevent them from being
poorly behaved. Note that for the choice $\lambda = 0$, (9) becomes (7), and (10) becomes
(8), so $\lambda = 0$ in ridge regression corresponds to the ordinary least squares
estimates, where the coefficients are unconstrained and may have high error bounds.
As $\lambda$ increases, the coefficients tend to stabilize and move toward zero.
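
The behavior described above can be checked numerically. The following sketch solves the ridge system (10) on synthetic data with two nearly collinear predictors, confirms that $\lambda = 0$ reproduces the OLS estimates, and records how the size of the coefficient vector changes as $\lambda$ grows. The data, the helper name `ridge_coefficients`, and the grid of $\lambda$ values are assumptions made for illustration.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for beta, as in (10)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
y = x1 + x2 + rng.normal(size=n)

X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize the columns
y = y - y.mean()

# lam = 0 reproduces the OLS solution of the normal equations (8).
beta_ols = ridge_coefficients(X, y, 0.0)

# Euclidean norm of the coefficient vector for increasing lam.
lams = (0.0, 0.5, 5.0, 50.0)
norms = [np.linalg.norm(ridge_coefficients(X, y, lam)) for lam in lams]
```

The entries of `norms` decrease as $\lambda$ increases, which is the shrinkage toward zero that the penalty term is designed to produce.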

We remark that including the penalty term $\lambda I$ in (11) can seem somewhat
ad hoc or arbitrary, but fortunately there is a nice Bayesian justification for this
approach. The ridge regression model (11) is equivalent to a Bayesian regression
model in which the coefficients $\beta$ are given a normal prior distribution with mean
0 and a variance that depends on $\lambda$. Changing $\lambda$ corresponds to changing how
influential the prior mean will be on the value of the estimates. From a linear
algebra perspective, the term $\lambda I$ is effectively padding the diagonal of $X^T X$, which
lowers its condition number, and makes the solutions $\beta$ less volatile.
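
The Bayesian equivalence can be seen in a short derivation. The sketch below assumes normally distributed errors with variance $\sigma_\varepsilon^2$ and a prior variance $\tau^2$ for each coefficient; both symbols are introduced here for illustration and do not appear elsewhere in the paper.

```latex
% Assumed model (\sigma_\varepsilon^2 and \tau^2 are symbols introduced here):
%   y \mid \beta \sim N(X\beta, \sigma_\varepsilon^2 I), \qquad \beta \sim N(0, \tau^2 I).
\begin{align*}
-\log p(\beta \mid y)
  &= \frac{1}{2\sigma_\varepsilon^2}(y - X\beta)^T (y - X\beta)
     + \frac{1}{2\tau^2}\,\beta^T \beta + \text{const} \\
  &= \frac{1}{2\sigma_\varepsilon^2}
     \left[ (y - X\beta)^T (y - X\beta)
     + \frac{\sigma_\varepsilon^2}{\tau^2}\,\beta^T \beta \right] + \text{const}.
\end{align*}
% Maximizing the posterior over \beta is therefore the same as minimizing (9)
% with \lambda = \sigma_\varepsilon^2 / \tau^2.
```

Under these assumptions, a small prior variance $\tau^2$ corresponds to a large $\lambda$, so the estimates are pulled more strongly toward the prior mean of 0.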

5.3 Choosing $\lambda$

Often, the ridge parameter $\lambda$ is chosen using cross-validation. With large data,
specifically when $n$, the number of rows, is large, computing $\lambda$ in this way can
be computationally expensive, as it requires one to compute $n$ leave-one-out
estimates. Another alternative is generalized cross-validation (GCV), which is also
computationally expensive. To see why, consider the hat matrix

$$H = X (X^T X + \lambda I)^{-1} X^T. \tag{12}$$

Finding $H$, or the trace of $H$, is a required step for GCV. If $X$ has $n$ rows, then $H$
is an $n$-by-$n$ matrix. For our even strength model, for example, we have well over
1,000,000 rows, meaning $H$ is a 1,000,000 by 1,000,000 matrix.

In our work, we use an estimate of the trace of $H$ to get a randomized version
of GCV similar to that in Girard (1991). This method uses the following lemma
given in Hutchinson (1989):

Lemma (Hutchinson (1989)). Let $B$ be an $n \times n$ symmetric matrix and let $u =
(u_1, \ldots, u_n)^T$ be a vector of $n$ independent samples from a random variable $U$
with mean 0 and variance $\sigma^2$. Then,

$$E(u^T B u) = \sigma^2 \, \mathrm{tr}(B). \tag{13}$$

Note that $E(\cdot)$ denotes expectation and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. The
hat matrix $H$ is symmetric, so the lemma applies. The lemma is useful because
$\frac{1}{\sigma^2} u^T H u$ is easier to compute than $\mathrm{tr}(H)$, and $\frac{1}{\sigma^2} u^T H u$ is an unbiased
estimate for $\mathrm{tr}(H)$ according to the lemma. Also, the estimate is very accurate
(see, for example, Girard (1991) or Hutchinson (1989)).

Note that using (13) and (12) we can write

$$\mathrm{tr}(H) \approx \frac{1}{\sigma^2} u^T H u = \frac{1}{\sigma^2} u^T [X (X^T X + \lambda I)^{-1} X^T] u \tag{14}$$

and since matrix multiplication is associative, we can group the terms in

$$u^T [X (X^T X + \lambda I)^{-1} X^T] u$$

in any order. We can write (14) as

$$\mathrm{tr}(H) \approx \frac{1}{\sigma^2} (u^T X)(X^T X + \lambda I)^{-1}(X^T u). \tag{15}$$

Note that if $X$ is an $n \times p$ matrix and $u$ is an $n \times 1$ matrix, then

$u^T X$ is a $1 \times p$ matrix,
$X^T X + \lambda I$ is a $p \times p$ matrix, and
$X^T u$ is a $p \times 1$ matrix,

so our biggest matrix is $p \times p$. Since typically $p \ll n$ when $n$ is very large, it is
much easier to work with a $p \times p$ matrix than an $n \times n$ matrix. In our case, for
example, $n$ is on the order of 1,000,000, while $p$ is on the order of only 1,000. We
used the estimate for the trace in (15) to obtain a randomized GCV choice for $\lambda$
as in Girard (1991).
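
The grouping in (15) can be sketched numerically as follows. The example uses synthetic data, probe vectors with entries $\pm 1$ (so that the mean is 0 and $\sigma^2 = 1$), and an average over 50 probes; these choices, and the function name `trace_estimate`, are assumptions made for illustration rather than a description of the exact procedure used in our models.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, lam = 2000, 20, 0.5
X = rng.normal(size=(n, p))

# The only matrix that must be factored is p x p.
A = X.T @ X + lam * np.eye(p)


def trace_estimate(num_probes):
    """Average of u^T H u over random probes u with mean 0 and variance 1."""
    total = 0.0
    for _ in range(num_probes):
        u = rng.choice([-1.0, 1.0], size=n)   # entries +1 or -1
        v = X.T @ u                           # the p-vector X^T u
        total += v @ np.linalg.solve(A, v)    # (u^T X) A^{-1} (X^T u), as in (15)
    return total / num_probes


# Exact trace for comparison, using tr(H) = tr(A^{-1} X^T X),
# which also avoids forming the n x n matrix H.
trace_exact = np.trace(np.linalg.solve(A, X.T @ X))
trace_approx = trace_estimate(50)
```

Neither computation ever forms the $n \times n$ hat matrix, which is the point of the regrouping.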

In some cases, we preferred to increase the value of $\lambda$ obtained by this method.
This change can be justified in several ways. For example, in some cases,
inspection of the trace curves (that is, the curves like those in Figure 1) revealed that
the estimates did not yet appear to be stabilized at those values of $\lambda$, and this
observation can be used to justify increasing $\lambda$. We also considered the
Hoerl-Kennard-Baldwin estimate (Hoerl et al., 1975) of $\lambda$. The Hoerl-Kennard-Baldwin
estimate is given by

$$\lambda_{HKB} = \frac{p \cdot MSE}{\beta^T \beta}, \tag{16}$$

where MSE denotes mean-squared error. Finally, we considered variance inflation
factors (VIF), which quantify the level of collinearity present in the data,
when choosing $\lambda$. As stated in Marquardt (1970), the VIF can be expressed as the
diagonal elements of

$$(X^T X + \lambda I)^{-1} X^T X (X^T X + \lambda I)^{-1}. \tag{17}$$

Typically, values in the single digits are preferred. Often the VIF were high for
the values of $\lambda$ that we got using GCV. We chose $\lambda$ at least high enough so that
the VIF were below 10.

These four pieces of information were considered when choosing $\lambda$ for each
of our 8 models that used 4 seasons of data from the 2007-08 through 2010-11
seasons. We also used this information with models that only used a single
season's worth of data, giving 8 more values of $\lambda$ for each season. In each case, we
chose the highest value of $\lambda$ suggested by these four methods. These values of
$\lambda$ were used in (11) to obtain estimates of the coefficients in each of our models.
The vertical line at $\lambda = 0.5$ in Figure 1 indicates the value of $\lambda$ that we chose for
that model. Note that the estimates seem to have stabilized for the most part by
the time $\lambda$ reaches 0.5.
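
Two of the criteria above, the Hoerl-Kennard-Baldwin estimate (16) and the VIF expression (17), can be sketched on synthetic data as follows. The data, the grid of candidate values, the degrees of freedom used for the MSE, and the names `ridge_vif`, `lam_hkb`, `lam_vif`, and `lam_chosen` are assumptions made for this example, and only two of the four criteria are shown.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 400, 3

# Two nearly collinear predictors and one unrelated predictor.
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n),
                     z + 0.05 * rng.normal(size=n),
                     rng.normal(size=n)])

# Standardize to correlation form: centered columns with unit length,
# so that X^T X is the correlation matrix of the predictors.
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)
y = X @ np.array([3.0, 3.0, 1.0]) + 0.1 * rng.normal(size=n)
y = y - y.mean()


def ridge_vif(X, lam):
    """Diagonal of (X^T X + lam I)^{-1} X^T X (X^T X + lam I)^{-1}, as in (17)."""
    XtX = X.T @ X
    A_inv = np.linalg.inv(XtX + lam * np.eye(X.shape[1]))
    return np.diag(A_inv @ XtX @ A_inv)


# Hoerl-Kennard-Baldwin estimate (16), computed from the OLS fit.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_ols
mse = residuals @ residuals / (n - p - 1)   # assumed degrees of freedom
lam_hkb = p * mse / (beta_ols @ beta_ols)

vif_ols = ridge_vif(X, 0.0)        # lam = 0 gives the usual OLS VIFs
vif_ridge = ridge_vif(X, lam_hkb)

# Smallest candidate lam on a grid for which every VIF is below 10.
lam_grid = np.logspace(-4, 1, 60)
lam_vif = next(lam for lam in lam_grid if ridge_vif(X, lam).max() < 10)

# Take the largest value suggested by the criteria considered here.
lam_chosen = max(lam_hkb, lam_vif)
```

Taking the maximum mirrors the rule of choosing the highest value of $\lambda$ suggested by the criteria, here restricted to the two that are easy to compute in a few lines.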

5.4 Supplemental figures 



[Figure 4 panels: "Histogram of shift lengths on the power play" (left) and "Histogram of shift lengths when a PP goal is scored" (right); horizontal axis in both panels: shift length in seconds.]



Figure 4: A comparison of the shift lengths during power play situations for all 
shifts (left) and only shifts during which a goal is scored (right). Typically, shift 
lengths are longer for the shifts when a goal is scored. This observation is similar 
to that made by Thomas et al. (2012, Figure 6) for even strength situations.






[Figure 5 panels: "Histogram of even strength ice time for players" (left) and "Histogram of power play ice time for players" (right).]

Figure 5: Distribution of players' ice time during even strength (left) and power
play (right) situations. The small grouping of players with more than 10,000
minutes of even strength playing time are all goalies.